1.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 734-739, 2022.
Article in English | Scopus | ID: covidwho-2261441

ABSTRACT

Data profiling is a "set of statistical data analysis activities to determine properties of a dataset". Historically, it was aimed at data (not meta-data), but at scale, the tables' meta-data (i.e., title, attribute names, types) becomes abundant, hence its profiling becomes vital, especially in order to understand the contents of large-scale structured datasets. Here we describe and evaluate the algorithms and models behind our scalable Meta-data profiler. It is capable of learning Meta-profiles for a topic of interest in extreme-scale structured datasets, such as WDC [1] or CORD-19 [2], which have millions of tables and hundreds of thousands of sources. A 3D Meta-profile visualizes a specific topic (e.g., COVID-19 vaccine side-effects) present in a large-scale structured dataset and simplifies access and comparison for data scientists and end-users. © 2022 IEEE.

2.
31st ACM International Conference on Information and Knowledge Management, CIKM 2022 ; : 4857-4861, 2022.
Article in English | Scopus | ID: covidwho-2108335

ABSTRACT

Accessing large-scale structured datasets such as WDC [31] or CORD-19 is very challenging [11, 13, 14, 41, 42]. Even if only one topic (e.g., Vaccine Side-Effects) is of interest, the side-effects tables in different papers have hundreds of different schemas, depending on the authors, which significantly complicates both finding and querying them. Here we demonstrate our scalable Meta-data profiler, capable of constructing a standardized interface to a topic of interest in large-scale structured datasets. This interface, called a Meta-profile, represents a meta-data summary for each topic, representative of the entire dataset. Such profiles can be used as a robust visualization as well as to simplify access to structured data for both data scientists and end users at scale [32, 42]. © 2022 ACM.
